fix(ported_static): Approach-1 + stale-skip cleanup for Amsterdam OoG-by-design tests by leolara · Pull Request #2843 · ethereum/execution-specs

leolara · 2026-05-12T14:32:15Z

🗒️ Description

Follow-up to #2839 — applies the Approach-1 / Approach-2 strategies
described in .mb/oog-by-design-amsterdam-approaches.md to the
OoG-by-design subset of Amsterdam-skipped ported_static tests.

Clears 74 skip entries (tests/ported_static/amsterdam_skip_list.txt
897 → 823).

Commits

feat(forks): add Fork.oog_budget_lift helper — new classmethod
on BaseFork that composes sstore_state_gas, create_state_gas,
and code_deposit_state_gas into a single budget lift for tests
calibrated to OoG just below an EIP-8037-affected cost boundary.
Returns 0 on pre-EIP-8037 forks, so callers apply it
unconditionally without a fork guard.
fix(ported_static): Approach-1 fork-conditional OoG lift —
8 single-spillable-op tests where tx_gas[1] is tuned to barely
complete a CREATE / SSTORE / CREATE+SSTOREs on Cancun but OoGs on
Amsterdam due to state-gas spill. Lift the budget with
Fork.oog_budget_lift(...) using the right
(creates, sstores, deploy_code_size) counts:
- stCreate2/test_create2_oo_gafter_init_code.py
- stCreate2/test_create2_oo_gafter_init_code_returndata2.py
- stCreateTest/test_create_oo_gafter_init_code.py
- stCreateTest/test_create_oo_gafter_init_code_returndata2.py
- stRevertTest/test_revert_sub_call_storage_oog.py
- stRevertTest/test_revert_sub_call_storage_oog2.py
- stWalletTest/test_wallet_construction_oog.py
- stWalletTest/test_multi_owned_construction_not_enough_gas_partial.py
chore(ported_static): remove stale Amsterdam skip entries for stSStoreTest 16-pair family —
22 stSStoreTest files (`sstore_0to*`, `sstore_xto*` excluding
`gas`, `gas_left`, `change_from_external_call_in_init_code`) had
3 skip entries each. Re-running these on Amsterdam with the skip
list disabled shows all 60 fixture variants per file already
pass without any code changes — the post-state assertion
(`contract_2: storage={1: 0}`) holds whether OoG fires at pair 14
(Cancun) or pair 5 (Amsterdam after state-gas spill). Remove all
66 stale entries, no test-file changes.
chore: ruff format oog_budget_lift unit test assertions —
cosmetic format pass on the helper's unit test.

Approach 4 (1024-depth CALL family) — deferred

The 3 `call1024_oog` / `callcode1024_oog` files have a per-frame
state-gas interaction that the simple `oog_budget_lift` model
underestimates (depth dropped from 134→44 frames with a 3×SSTORE
lift, vs expected near-zero drop). They need per-frame analysis
beyond the helper's scope. Leaving them skipped; follow-up PR.

Verification

`uv run fill --fork Amsterdam -m "not slow" tests/ported_static/` → 0 failed (16704 passed, 2226 skipped)
`uv run fill -m "not slow" tests/ported_static/` (all forks) → 0 failed (60476 passed)
`uv run pytest packages/testing/.../test_forks.py::test_oog_budget_lift` → 1 passed
`just static` → clean (ruff, mypy, vulture, ethereum-spec-lint, actionlint, codespell)

🔗 Related Issues or PRs

Follows #2839 on the same fork. Independent — can merge in either
order.

✅ Checklist

All: Ran `just static` — clean.
All: PR title follows the repo standard.
All: Considered updating the online docs in ./docs/.
All: Set appropriate labels (only maintainers can apply).
Tests: Ran `mkdocs serve` to verify auto-generated docs.
Tests: post-mortem update (N/A — not implementing a missed test
case).
Ported Tests: `@manually-enhanced` marker added to all eight
edited tests; `@ported_from` preserved from upstream.

codecov · 2026-05-12T14:45:49Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (devnets/bal/7@a3e5201). Learn more about missing BASE report.

Additional details and impacted files

@@               Coverage Diff                @@
##             devnets/bal/7    #2843   +/-   ##
================================================
  Coverage                 ?   87.35%           
================================================
  Files                    ?      586           
  Lines                    ?    35957           
  Branches                 ?     3382           
================================================
  Hits                     ?    31410           
  Misses                   ?     3926           
  Partials                 ?      621

Flag	Coverage Δ
unittests	`87.35% <ø> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

OoG-by-design ported_static tests are calibrated to land just below a cost boundary; on Amsterdam each fresh SSTORE-set and CREATE spills its state-gas portion back into regular gas when the per-tx reservoir is empty. Tests that expect "N SSTOREs and M CREATEs complete before OoG" need their gas budget lifted by N * sstore_state_gas + M * create_state_gas to land at the same intermediate state. `Fork.oog_budget_lift(sstores_before_oog=N, creates_before_oog=M)` composes the two existing state-gas helpers and returns 0 on pre-EIP-8037 forks (where both helpers are 0), so callers can apply it unconditionally without a fork guard. Unit-tested on Cancun (zero) and Amsterdam (cumulative spill).

…*/stRevert*/stWallet* tests Eight tests share the OoG-by-design shape `tx_gas = [oog_path, success_path]` where the success_path budget is tuned to barely complete a single CREATE, a CREATE plus a few SSTOREs, or a deploy chain. On Amsterdam EIP-8037 splits NEW_ACCOUNT, fresh SSTORE-set, and code-deposit cost into a regular portion plus a state-gas portion; with an empty reservoir, the state-gas spills back into regular gas and breaks the success_path budget. Apply `Fork.oog_budget_lift` with the right (creates, sstores, deploy_code_size) counts to lift the budget on Amsterdam only. Pre- EIP-8037 forks return 0 from the helper, so the original budget is preserved. Files (skip-list entries cleared): - stCreate2/test_create2_oo_gafter_init_code.py (-g1) - stCreate2/test_create2_oo_gafter_init_code_returndata2.py (-g1) - stCreateTest/test_create_oo_gafter_init_code.py (-g1) - stCreateTest/test_create_oo_gafter_init_code_returndata2.py (-g1) - stRevertTest/test_revert_sub_call_storage_oog.py (-g1-v0) - stRevertTest/test_revert_sub_call_storage_oog2.py (-g1-v0) - stWalletTest/test_wallet_construction_oog.py (-g1) - stWalletTest/test_multi_owned_construction_not_enough_gas_partial.py (-g1) Removes 8 entries from amsterdam_skip_list.txt.

…eTest 16-pair family 22 stSStoreTest files (`sstore_0to*`, `sstore_xto*` excluding `gas`, `gas_left`, `change_from_external_call_in_init_code`) had 3 skip entries each (`d{0,1,2}-g1`). Re-running these on Amsterdam with the skip list disabled shows all 60 fixture variants per file already pass without any code changes. The post-state expectation at `g=1` is `contract_2: storage={1: 0}` (only slot 1, asserted to be zero). On Cancun, ~14 of the 16 SSTORE pairs complete before OoG; on Amsterdam EIP-8037 the state- gas spill cuts that to ~5 pairs. In both cases slot 1 ends at 0 and the final `SSTORE(1, 1)` does not run, so the assertion holds on both forks unchanged. These were defensive skip entries from an earlier snapshot. Remove all 66 entries; no test-file changes needed.

kclowes

This is looking good @leolara! I may just be missing it, but what is .mb/oog-by-design-amsterdam-approaches.md? Also added a comment about adding the lift to both tx_gas values - let me know what you think!

kclowes · 2026-05-15T21:58:36Z

        Bytes(""),
    ]
-    tx_gas = [54000, 55000]
+    tx_gas = [54000, 55000 + fork.oog_budget_lift(creates_before_oog=1)]


Should we apply the lift to both tx_gas values? Claude tells me that: Without the lift, tx_gas[0] on Amsterdam OoGs at CREATE2 dispatch before init code runs — the assertions still pass, but the test no longer exercises the scenario it claims to.

I think my last commit fixes this? Now I see that the first transaction was ooG in the wrong place

kclowes · 2026-05-15T21:59:21Z

 # stripping the fixture-format suffix in conftest.py).
 #
-# Total entries: 554
+# Total entries: 480


…ter_init_code tests Address kclowes's review on ethereum#2843: with only tx_gas[1] lifted, g=0 OoG'd at CREATE/CREATE2 dispatch on Amsterdam (NEW_ACCOUNT state-gas spill) before init code ever ran — the assertion still held (`NONEXISTENT` either way) but the failure mode shifted from "OoG after init code" (the test's named scenario) to "dispatch-time OoG". A clean closed form using `fork.oog_budget_lift(creates_before_oog=1)` (183600) overshoots and pushes g=0 past the deploy threshold. The Cancun 1000-gas gap between g=0 and g=1 collapses on Amsterdam: once dispatch is cleared, the 5-byte init code is cheap enough to always complete. Empirical binary-search on both files puts the safe range at (166499, 167000); 166_750 sits in the middle, keeping g=0 OoG'ing at dispatch and g=1 just clearing the deploy threshold. The two `_returndata2` variants are left unchanged — g=0's post-state happens to land identically on both forks at the existing budget, and adding any lift breaks them.

leolara · 2026-05-22T14:02:46Z

This is looking good @leolara! I may just be missing it, but what is .mb/oog-by-design-amsterdam-approaches.md? Also added a comment about adding the lift to both tx_gas values - let me know what you think!

the directory .mb works as a memory bank of my agent, where we store findings and things that should know in the future.

In this case the content of this file is:

# OoG-by-design ported_static tests — fix approaches for Amsterdam

After PR #2839 there are still **119 OoG-by-design entries** in
`tests/ported_static/amsterdam_skip_list.txt`. These are tests whose
budgets are tuned to land *just below* an EIP-8037-affected cost
boundary (SSTORE-set, CALL+value to empty account, MEMORY expansion,
nested CALL forwarding), so a blind gas bump invalidates their
assertion and a plain skip loses the coverage.

This note enumerates tractable approaches, ordered by risk and
expected entries cleared.

## Approach 1 — extend `tx_gas` with an Amsterdam threshold (lowest risk)

Tests of shape

```python
tx_gas = [happy_path, just_below_threshold]

where the threshold is "intrinsic + 1 spillable op" can be made
fork-conditional in one line:

tx_gas = [800000, 80000]
if fork.is_eip_enabled(8037):
    # Amsterdam: OoG threshold lifts by the SSTORE-set state-gas
    tx_gas = [800000, 80000 + fork.sstore_state_gas()]

Works when the OoG depends on one spillable operation. Pick-up:
~20–30 entries — simpler stSStoreTest, stCreate2/*_oo_gafter_*,
stWalletTest. Each fix is ≤6 lines.

Approach 2 — derive the bump from the expected post-state

For tests that complete N SSTOREs and then OoG (post-state shows
storage={1:1, …, N:1} with entry N+1 missing), the Amsterdam budget
needs + N × fork.sstore_state_gas() headroom to land at the same
intermediate state. A tiny helper centralises the math:

def oog_budget(base: int, *, sstores_before_oog: int, fork: Fork) -> int:
    """Adjust an OoG-tuned gas budget for EIP-8037 state-gas spill."""
    return base + sstores_before_oog * fork.sstore_state_gas()

Per-test usage stays a single line. Pick-up: ~40–50 entries —
stSStoreTest/test_sstore_0to0*, xto_* family, parts of
stStaticCall/test_static_check_opcodes5,
stRevertTest/test_revert_opcode_multiple_sub_calls. Each fix is
≤8 lines but requires reading the expected post-state to count N.

Approach 3 — fork-conditional `expect_section` entries

Tests already using resolve_expect_post can get a ">=Amsterdam"
entry with a smaller intermediate state — accept that the OoG fires
earlier on Amsterdam and the recorded state is different. Keeps the
original gas budget intact.

expect_entries_ = [
    {
        "indexes": {"data": -1, "gas": 1, "value": -1},
        "network": ["Cancun"],
        "result": {target: Account(storage={1: 1, 2: 1, 3: 1})},
    },
    {
        "indexes": {"data": -1, "gas": 1, "value": -1},
        "network": [">=Amsterdam"],
        "result": {target: Account(storage={1: 1})},  # OoGs sooner
    },
]

Trades gas-bump complexity for post-state enumeration. Useful where
multiple d parametrizations share a tx_gas list and you can't
bump it without breaking other entries. Pick-up: ~10–15 entries.

Approach 4 — CALL-stack-depth tests (`*1024_oog` family)

stCallCreateCallCodeTest/test_call1024_oog,
test_callcode1024_oog, stDelegatecallTestHomestead/test_call1024_oog,
test_delegatecall1024_oog recurse 1024 deep and assert OoG when the
call frame hits the depth limit. EIP-8037 doesn't touch stack-depth
mechanics — the failures here are gas exhaustion firing before
depth 1024 because each frame's state-gas spill compounds.

Fix: bump the inner-CALL forwarded gas so each frame survives until
the depth limit fires. Original gas budget stays. Pick-up: ~9 entries.

Approach 5 — accept-and-document the rest

The genuinely stuck cases, with explicit reasons:

Pre-EIP-150 gas-pricing assertions — tests like
stStaticCall/test_static_callcallcodecallcode_011_oogm_after* are
calibrated against gas costs that no longer match production
behaviour. EIP-8037 spec tests already cover the new 2D gas model.
Memory-expansion + state-gas combination — stMemExpandingEIP150Calls/*
hits OoG at exact byte counts where the state-gas reservoir changes
the effective cap; no single-variable fix.
Inverse failures — stEIP3860_limitmeterinitcode/*invalid cases:
Amsterdam allows what was previously rejected; behaviour change, not
a gas issue.

For these, add a one-line rationale next to the skip entry (or a
section header comment in amsterdam_skip_list.txt) and leave them
as documented coverage gaps.

Recommended sequencing

PR-A: Approach 1 + Approach 4 — mechanical, ≤15 lines per file,
~30–40 entries cleared.
PR-B: Approach 2 — introduce the oog_budget helper, sweep
stSStoreTest. ~50 entries cleared. Helper lives in
execution_testing.forks or a per-package utils module.
PR-C: Approach 3 — selective use on the harder multi-d tests
where (2) won't fit. ~10–15 entries.
PR-D: documentation pass — annotate the residue (~15–20
entries) with skip rationale comments so reviewers know they're
intentional gaps, not "yet to be triaged".

Endpoint: ~80 stubbornly-skipped tests, each with a documented
reason — vs. the current 554 with one bucket reason for everything.

Per-test inventory (for the PR-A starting point)

Best Approach-1 candidates (single-SSTORE OoG, tx_gas[1] adjustable):

stSStoreTest/test_sstore_gas.py
stSStoreTest/test_sstore_gas_left.py
stCreate2/test_create2_oo_gafter_init_code.py (g1)
stCreate2/test_create2_oo_gafter_init_code_returndata2.py (g1)
stCreate2/test_create2_oo_gafter_init_code_revert2.py
stCreateTest/test_create_oo_gafter_init_code.py
stCreateTest/test_create_oo_gafter_init_code_returndata2.py
stCreateTest/test_create_oo_gafter_init_code_returndata_size.py
stCreateTest/test_create_oo_gafter_init_code_revert2.py
stWalletTest/test_day_limit_construction_partial.py
stWalletTest/test_wallet_construction_oog.py (g1)
stWalletTest/test_wallet_construction_partial.py
stWalletTest/test_multi_owned_construction_not_enough_gas_partial.py (g1)
stRevertTest/test_revert_sub_call_storage_oog.py (g1)
stRevertTest/test_revert_sub_call_storage_oog2.py (g1)
stMemoryTest/test_oog.py

Best Approach-4 candidates:

stCallCreateCallCodeTest/test_call1024_oog.py (4 entries)
stCallCreateCallCodeTest/test_callcode1024_oog.py (2 entries)
stDelegatecallTestHomestead/test_call1024_oog.py (2 entries)
stDelegatecallTestHomestead/test_delegatecall1024_oog.py (1 entry)

leolara · 2026-05-25T10:24:22Z

@spencer-tb I think the lint problem I am getting is from the base branch, on which branch should I rebase this?

leolara · 2026-05-26T15:14:12Z

`devnets/bal/7` ported_static — progress after PR #2843

Snapshot date: 2026-05-26
Branch: wt-bal-7-amsterdam-oog-by-design (PR #2843)

Remaining after PR #2843 merges

Metric	Count
Skip-list entries (nodeid substrings)	480
Skipped fixture variants on Amsterdam fill	1,197
Passing fixture variants on Amsterdam	17,733
Amsterdam pass rate on `tests/ported_static/`	93.7%

Cumulative reduction across all my PRs

Stage	Skip-list entries	Fixture variants
Before PR #2790 (start of work)	~897	~2,691
After PR #2796 + #2790	~732	~2,196
After PR #2839	554	~1,662
After PR #2843 (this PR)	480	1,197
Net cleared by my work	−417	~−1,494

Notes

Skip-list entries are nodeid substrings; each typically expands to
~2–3 fixture variants (state_test,
blockchain_test_from_state_test,
blockchain_test_engine_from_state_test).
The 1,197 figure is the actual count uv run fill --fork Amsterdam
reports as skipped.
For comparison: issue Static Test Fail Tracker for EIP-8037 #2601 quoted 3,423 failures on the
pre-recalibration eips/amsterdam/eip-8037 branch. The 1,197 number
on bal/7 is ~65% lower despite the CPSB-1530 recalibration adding
new failure modes that didn't exist at the time Static Test Fail Tracker for EIP-8037 #2601 was filed.
Classification of the remaining 480 entries (OoG-by-design,
gas-measurement, bytecode-baked, multi-param-per-d, balance-refund,
size-limit, other) is in
bal-7-skipped-tests-classification.md.

…terdam-oog-by-design

… → devnets/bal/7 merge The May-18 merge `dffc4cfea` ("Merge remote-tracking branch 'upstream/forks/amsterdam' into devnets/bal/7") had two conflict resolutions that left `bal/7` in a state where `just static` fails: 1. `src/ethereum/forks/amsterdam/blocks.py` ended up with the EIP-7843 `slot_number: U64` field declared twice (lines 260 and 268). `mypy` rejects it with `[no-redef]`; `ethereum-spec-lint` crashes with `ValueError: duplicate path Header.slot_number`. Remove the second copy. 2. `BuiltBlock.derive_engine_payload_modifier` was dropped from `packages/testing/src/execution_testing/specs/blockchain.py` while its 5 call sites in `specs/tests/test_types.py` were kept. `mypy` reports 5 `attr-defined` errors. Restore the staticmethod (and the `FixtureExecutionPayloadModifier` import it needs) from `forks/amsterdam`. The instance-level wiring on `forks/amsterdam` (`BuiltBlock.rlp_modifier` field, constructor pass-through, and `get_fixture_engine_new_payload` call) is **not** restored — neither the linter nor the test file references it, and `bal/7`'s current `get_fixture_engine_new_payload` already runs without it. This unblocks CI on every open PR against `devnets/bal/7` (including this one).

leolara · 2026-05-26T15:29:27Z

@spencer-tb This commit here: 615745c is fixing something unrelated to this PR, but an error in bal/7 branch, that is causing the automatic tests to don't pass

leolara · 2026-05-28T10:18:25Z

@kclowes could you please check again?

leolara requested review from kclowes, marioevz and spencer-tb May 13, 2026 10:10

leolara added 3 commits May 14, 2026 16:31

leolara force-pushed the wt-bal-7-amsterdam-oog-by-design branch from a623fa1 to 1ad2430 Compare May 14, 2026 11:54

kclowes reviewed May 15, 2026

View reviewed changes

leolara mentioned this pull request May 18, 2026

Static Test Fail Tracker for EIP-8037 #2601

Open

leolara added 2 commits May 26, 2026 22:15

Merge remote-tracking branch 'origin/devnets/bal/7' into wt-bal-7-ams…

0669749

…terdam-oog-by-design

leolara mentioned this pull request May 28, 2026

feat(spec-specs, tests): alt to merge 8037 to forks/amsterdam #2901

Open

4 tasks

leolara requested a review from kclowes May 28, 2026 10:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(ported_static): Approach-1 + stale-skip cleanup for Amsterdam OoG-by-design tests#2843

fix(ported_static): Approach-1 + stale-skip cleanup for Amsterdam OoG-by-design tests#2843
leolara wants to merge 6 commits into
ethereum:devnets/bal/7from
leolara:wt-bal-7-amsterdam-oog-by-design

leolara commented May 12, 2026

Uh oh!

codecov Bot commented May 12, 2026 •

edited

Loading

Uh oh!

kclowes left a comment

Uh oh!

kclowes May 15, 2026

Uh oh!

leolara May 22, 2026

Uh oh!

kclowes May 15, 2026

Uh oh!

leolara commented May 22, 2026

Uh oh!

leolara commented May 25, 2026

Uh oh!

leolara commented May 26, 2026

Uh oh!

leolara commented May 26, 2026

Uh oh!

leolara commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

leolara commented May 12, 2026

🗒️ Description

Commits

Approach 4 (1024-depth CALL family) — deferred

Verification

🔗 Related Issues or PRs

✅ Checklist

Uh oh!

codecov Bot commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

kclowes left a comment

Choose a reason for hiding this comment

Uh oh!

kclowes May 15, 2026

Choose a reason for hiding this comment

Uh oh!

leolara May 22, 2026

Choose a reason for hiding this comment

Uh oh!

kclowes May 15, 2026

Choose a reason for hiding this comment

Uh oh!

leolara commented May 22, 2026

Approach 2 — derive the bump from the expected post-state

Approach 3 — fork-conditional expect_section entries

Approach 4 — CALL-stack-depth tests (*1024_oog family)

Approach 5 — accept-and-document the rest

Recommended sequencing

Per-test inventory (for the PR-A starting point)

Uh oh!

leolara commented May 25, 2026

Uh oh!

leolara commented May 26, 2026

devnets/bal/7 ported_static — progress after PR #2843

Remaining after PR #2843 merges

Cumulative reduction across all my PRs

Notes

Uh oh!

leolara commented May 26, 2026

Uh oh!

leolara commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov Bot commented May 12, 2026 •

edited

Loading

Approach 3 — fork-conditional `expect_section` entries

Approach 4 — CALL-stack-depth tests (`*1024_oog` family)

`devnets/bal/7` ported_static — progress after PR #2843